A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms
Authors
Abstract
Reinforcement learning (RL) is a paradigm in which an agent learns to accomplish tasks by interacting with the environment, similar to how humans learn. RL is therefore viewed as a promising approach to achieving artificial intelligence, as evidenced by its remarkable empirical successes. However, many RL algorithms are not well understood theoretically, especially in settings where function approximation and off-policy sampling are employed. My thesis [1] aims to develop a thorough theoretical understanding of the performance of various RL algorithms through finite-sample analysis. Most RL algorithms are essentially stochastic approximation (SA) algorithms for solving variants of the Bellman equation. The first part of the thesis is therefore dedicated to the analysis of general SA involving a contraction operator under Markovian noise. We develop a Lyapunov approach in which we construct a novel Lyapunov function called the generalized Moreau envelope. The results on SA enable us to establish finite-sample bounds for various RL algorithms in the tabular setting (cf. Part II of the thesis) and when using function approximation (cf. Part III of the thesis). These bounds in turn provide theoretical insights into several important problems in the RL community, such as the efficiency of bootstrapping, the bias-variance trade-off in off-policy learning, and the stability of off-policy control. The main body of this document provides an overview of the contributions of my thesis.
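The abstract describes RL algorithms as SA schemes for solving a Bellman equation whose operator is a contraction, driven by Markovian noise. The code below is a minimal sketch of that viewpoint. It runs standard tabular Q-learning along a single trajectory on a small MDP and compares the result with the fixed point computed by value iteration. It is not the thesis's analysis: the Lyapunov argument and the generalized Moreau envelope are not implemented here. Several parts of the sketch are illustrative assumptions and do not come from the source:

- the two-state, two-action MDP with deterministic rewards;
- the uniform behavior policy;
- the step size 1/n^0.7, where n is the visit count of the updated entry;
- all helper names.

```python
import random

# Hypothetical example MDP: P[s][a][s'] transition probabilities, R[s][a] rewards.
P = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.6, 0.4], [0.1, 0.9]],
]
R = [[1.0, 0.0], [0.0, 0.5]]
GAMMA = 0.5


def bellman_optimality(Q, P, R, gamma):
    """(HQ)(s, a) = R[s][a] + gamma * sum_s' P[s][a][s'] * max_a' Q[s'][a']."""
    n = len(Q)
    return [
        [R[s][a] + gamma * sum(P[s][a][t] * max(Q[t]) for t in range(n))
         for a in range(len(Q[s]))]
        for s in range(n)
    ]


def sup_dist(Q1, Q2):
    """Sup-norm distance ||Q1 - Q2||_inf, the norm in which H is a gamma-contraction."""
    return max(abs(x - y) for r1, r2 in zip(Q1, Q2) for x, y in zip(r1, r2))


def value_iteration(P, R, gamma, iters=200):
    """Reference fixed point Q* = H Q*, computed with the known model."""
    Q = [[0.0] * len(R[0]) for _ in R]
    for _ in range(iters):
        Q = bellman_optimality(Q, P, R, gamma)
    return Q


def q_learning_markovian(P, R, gamma, steps, seed=0):
    """Model-free SA: each update uses one sample Y_k = (S_k, A_k, S_{k+1})
    taken along a single trajectory, so the noise is Markovian, not i.i.d."""
    rng = random.Random(seed)
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)  # uniform behavior policy (assumed), i.e. off-policy
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        visits[s][a] += 1
        alpha = 1.0 / visits[s][a] ** 0.7  # polynomial step size (assumed schedule)
        target = R[s][a] + gamma * max(Q[s_next])  # noisy sample of (HQ)(s, a)
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


if __name__ == "__main__":
    q_star = value_iteration(P, R, GAMMA)
    q_hat = q_learning_markovian(P, R, GAMMA, steps=100_000)
    print("sup-norm error:", sup_dist(q_hat, q_star))
```

Because H is a gamma-contraction in the sup-norm, `sup_dist(bellman_optimality(Q1, ...), bellman_optimality(Q2, ...))` never exceeds `GAMMA * sup_dist(Q1, Q2)`. Q-learning replaces the expectation over the next state with the single observed next state of the trajectory, and that substitution is where the Markovian noise enters.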
Similar Resources
A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can prov...
A Framework for Aggregation of Multiple Reinforcement Learning Algorithms
Aggregation of multiple Reinforcement Learning (RL) algorithms is a new and effective technique to improve the quality of Sequential Decision Making (SDM). SDM is very common and important in various realistic applications, especially in automatic control problems. The quality of an SDM depends on (discounted) long-term rewards rather than the instant rewards. Due to delayed feedback, SDM tasks ...
Algorithms for Learning Finite Automata from Queries: A Unified View
In this survey we compare several known variants of the algorithm for learning deterministic finite automata via membership and equivalence queries. We believe that our presentation makes it easier to understand what is going on and what the differences between the various algorithms mean. We also include the comparative analysis of the algorithms, review some known lower bounds, prove a new one, ...
A Unified Approach for Design of LP Polynomial Algorithms
By summarizing Khachiyan's algorithm and Karmarkar's algorithm for linear program (LP), a unified methodology for the design of polynomial-time algorithms for LP is presented in this paper. A key concept is the so-called extended binary search (EBS) algorithm introduced by the author. It is used as a unified model to analyze the complexities of the existing modern LP algorithms and possibly, help ...
A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon
Many reinforcement learning algorithms, like Q-Learning or R-Learning, correspond to adaptive methods for solving Markovian decision problems in infinite-horizon when no model is available. In this article we consider the particular framework of non-stationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total reward criterion and the avera...
Journal
Journal title: Performance Evaluation Review
Year: 2022
ISSN: 1557-9484, 0163-5999
DOI: https://doi.org/10.1145/3579342.3579346